Alignment of RNA with Structures of Unlimited Complexity

Authors

  • Alessandro Dal Palù
  • Mathias Möhl
  • Sebastian Will
Abstract

Sequence-structure alignment of RNA with arbitrary secondary structure is Max-SNP-hard. Therefore, the problem of RNA alignment is commonly restricted to nested structures, where dynamic programming yields efficient solutions. However, nested structures cannot model pseudoknots or even more complex structural dependencies. Nevertheless, such dependencies are essential and conserved features of many RNAs. Only a few existing approaches deal with crossing structures. Here, we present a constraint approach for the alignment of structures in the even more general class of unlimited structures. Our central contribution is a new RNA alignment constraint propagator, based on an efficient O(n) relaxation of the RNA alignment problem. Our constraint-based approach, Carna, solves the alignment problem for sequences with given input structures of unlimited complexity. Carna is implemented using Gecode.

In the post-genomic era, biologists are increasingly interested in studying non-coding RNA molecules with catalytic and regulatory activity as central players in biological systems. The computational analysis of non-coding RNA requires taking structural information into account. Although RNAs form three-dimensional structures, structural analysis of RNA is usually concerned with the secondary structure of an RNA, i.e., the set of base pairs (i, j) that form contacts (H-bonds) between the bases i and j. The RNA alignment problem is to align two RNA sequences A and B, each with a given secondary structure, such that a score based on sequence and structure similarity is optimized. The difficulty of this problem depends on the complexity of the RNA structures; therefore, a complexity hierarchy of RNA structures was introduced.
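The sketch below makes the alignment problem concrete by scoring one candidate alignment. It assumes a simplified scheme with three parts: a substitution score for each aligned pair of bases, a linear gap cost for each unaligned base, and a fixed bonus for each base pair of A whose two ends are aligned to the two ends of a base pair of B. The function name `alignment_score` and all parameter values are hypothetical. This is not Carna's actual scoring. Structures are plain sets of pairs, so the score is defined for structures of any complexity. The hard part is finding the alignment that maximizes it.

```python
def alignment_score(seq_a, seq_b, struct_a, struct_b, matches,
                    match=2, mismatch=-1, gap=-2, arc_bonus=5):
    """Score one alignment of two RNAs under a toy sequence+structure scheme.

    seq_a, seq_b       -- RNA sequences, indexed from 0
    struct_a, struct_b -- sets of base pairs (i, j) with i < j; any set is
                          allowed, so crossing or multiply paired bases are fine
    matches            -- aligned position pairs (i, k), strictly increasing in
                          both i and k; all other positions are gaps
    The default parameter values are arbitrary placeholders (hypothetical).
    """
    # An alignment must preserve the order of both sequences.
    for (i1, k1), (i2, k2) in zip(matches, matches[1:]):
        if not (i1 < i2 and k1 < k2):
            raise ValueError("matches must be strictly increasing in both sequences")
    a_to_b = dict(matches)

    # Sequence part: substitution score for each aligned pair of bases ...
    score = sum(match if seq_a[i] == seq_b[k] else mismatch for i, k in matches)
    # ... plus a linear gap cost for every base left unaligned.
    unaligned = (len(seq_a) - len(matches)) + (len(seq_b) - len(matches))
    score += gap * unaligned

    # Structure part: a base pair (i, j) of A earns a bonus when i and j are
    # aligned to the two ends k < l of a base pair (k, l) of B.
    pairs_b = set(struct_b)
    for i, j in struct_a:
        if i in a_to_b and j in a_to_b and (a_to_b[i], a_to_b[j]) in pairs_b:
            score += arc_bonus
    return score


if __name__ == "__main__":
    # A: two stacked pairs; B: the same stem with one fewer unpaired base.
    s = alignment_score("GGAACC", "GGACC",
                        {(0, 5), (1, 4)}, {(0, 4), (1, 3)},
                        [(0, 0), (1, 1), (2, 2), (4, 3), (5, 4)])
    print(s)  # 18
```

In the example, five identical bases contribute 10 and one gap in A costs 2. Both base pairs of A are matched to base pairs of B, adding 10, for a total of 18. An exact aligner would have to search over all valid `matches` for the maximum. Under the stated hardness result, that search cannot be done efficiently for arbitrary structures by simple enumeration or plain dynamic programming.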
Most RNA analysis is performed for the class of nested structures P, where base pairs do not cross, because efficient dynamic programming algorithms for structure prediction and alignment exist for this class under reasonable scoring schemes [12, 5]. The more general class of crossing RNA structures P restricts the degree of base pairing to at most one (each base pairs with at most one other base), as is commonly assumed for a single RNA structure. Prediction and alignment in this class are NP-hard in general [2]. Nevertheless, several algorithms efficiently predict or align RNAs whose structures fall into classes between non-crossing and arbitrary crossing [9, 8, 7]; however, their complexities limit their range of application. Other approaches to RNA alignment handle crossing structures with parameterized complexity, where the parameter captures the complexity of the structures [6]. Finally, the ILP approach Lara [1] computes alignments of arbitrarily complex crossing structures and appears to be more effective than approaches based on dynamic programming. The success of this AI technique was a strong motivation for this work, in which we study the alignment of RNAs with structures of unlimited complexity using constraint programming.

Contribution

We devise a constraint algorithm for the problem of aligning two RNA molecules with given sequences and unlimited secondary structures. By modeling and propagating constraints on integers, the method goes beyond rephrasing the ILP approach [1] in CP. We describe the constraint model, develop a new RNA alignment propagator, and present a specific search strategy. The method is implemented using the Gecode constraint programming system. Finally, we apply it to align both RNA molecules with given fixed structures and RNA molecules with associated base pair probability matrices.

A. Dovier, A. Dal Palù, S. Will (eds.); WCB10; Volume 123, pp. 53–58
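The structure classes named in the introduction can be told apart mechanically. The sketch below assumes a structure is given as a collection of 0-based base pairs (i, j). It returns the most restrictive class containing the structure:

- "unlimited" if some base takes part in more than one pair;
- "crossing" if every base pairs at most once but two pairs (i, j) and (k, l) satisfy i < k < j < l;
- "nested" otherwise.

The function name `classify_structure` and its return labels are illustrative, not taken from Carna.

```python
from itertools import combinations


def classify_structure(pairs):
    """Return the most restrictive structure class that contains `pairs`.

    pairs -- iterable of base pairs (i, j) over positions of one RNA.
    Labels (illustrative): "nested"    -- each base pairs at most once, no crossings
                           "crossing"  -- each base pairs at most once, some crossing
                           "unlimited" -- some base takes part in several pairs
    """
    # Normalize to i < j and drop duplicate pairs.
    norm = {(min(i, j), max(i, j)) for i, j in pairs}

    # Degree of base pairing: in how many pairs does each base occur?
    degree = {}
    for i, j in norm:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    if any(d > 1 for d in degree.values()):
        return "unlimited"

    # With degree <= 1 all endpoints are distinct; sorting guarantees i < k,
    # and two pairs (i, j), (k, l) then cross exactly when i < k < j < l.
    for (i, j), (k, l) in combinations(sorted(norm), 2):
        if i < k < j < l:
            return "crossing"
    return "nested"


if __name__ == "__main__":
    print(classify_structure([(0, 9), (1, 8), (3, 6)]))  # nested
    print(classify_structure([(0, 4), (2, 7)]))          # crossing (pseudoknot)
    print(classify_structure([(0, 4), (0, 7)]))          # unlimited
```

The pairwise crossing test is quadratic in the number of base pairs. That keeps the sketch short and is sufficient for illustration. A stack-based left-to-right scan could check nestedness in linear time.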


Similar resources


A zero one programming model for RNA structures with arclength ≥ 4

In this paper, we consider RNA structures with arc-length ≥ 4. First, we represent these structures as matrix models and zero-one linear programming problems. Then, we obtain an optimal solution for this problem using an implicit enumeration method. The optimal solution corresponds to an RNA structure with the maximum number of hydrogen bonds.


Tree decomposition and parameterized algorithms for RNA structure-sequence alignment including tertiary interactions

We present a general setting for structure-sequence comparison in a large class of RNA structures that unifies and generalizes a number of recent works on specific families of structures. Our approach is based on tree decomposition of structures and gives rise to a general parameterized algorithm, where the exponential part of the complexity depends on the family of structures. For each of the...


RNA Secondary Structure Alignment Based on Stem Representation

Comparison methods for RNA or protein molecules are important and basic tools in molecular biology. So far, most comparison methods, such as sequence alignment, are only applicable to the primary structures of biomolecules. However, the functions of biomolecules are closely related to their structures. The RNA secondary structure alignment problem is to align two given RNA structures to ...


Average complexity of the Jiang-Wang-Zhang pairwise tree alignment algorithm and of a RNA secondary structure alignment algorithm

We prove that the average complexity of the pairwise ordered tree alignment algorithm of Jiang, Wang and Zhang is in O(nm), where n and m stand for the sizes of the two trees, respectively. We show that the same result holds for the average complexity of pairwise comparison of RNA secondary structures, using a set of biologically relevant operations.



Publication date: 2010